ENH: Out-of-core architecture rewrite and filter optimizations by joeykleingers · Pull Request #1568 · BlueQuartzSoftware/simplnx

joeykleingers · 2026-03-24T18:07:36Z

Summary

This branch does two things at once:

1. Reworks the core out-of-core (OOC) data-storage architecture in simplnx so that disk-backed storage is supplied by a runtime-registered IO manager through clean interfaces, with the core library holding no reference to any concrete OOC format or symbol. Core gains a bulk-I/O data-store API (copyIntoBuffer / copyFromBuffer), an injectable store-format resolver, IO-manager lifecycle hooks, a tri-state storage-mode preference, a memory-budget manager, and memory-safety guards. The OOC implementation itself (chunked HDF5 stores, the chunk cache, deflate decompression) lives in a separate private SimplnxOoc source set and is not part of this diff; it plugs into core at runtime through these interfaces.
2. Adds chunk-aware (OOC-optimized) algorithm variants for ~50 filters across the SimplnxCore and OrientationAnalysis plugins, plus shared core utilities that benefit many more. Random-access algorithms that thrash the chunk cache on disk-backed arrays now either stream data with bulk I/O or dispatch to a sequential "scanline / CCL" variant, eliminating per-element virtual dispatch and HDF5 chunk churn. In-core behavior and performance are preserved via a runtime storage-type check, so optimized filters keep two code paths: one for in-memory data, one for disk-backed data.

Scope of this diff

develop…HEAD: 378 files, +42,853 / −13,112.

Area	Files	+ / −
`src/simplnx` (core library)	74	+6,874 / −1,252
`Plugins/SimplnxCore`	179	+24,817 / −8,795
`Plugins/OrientationAnalysis`	103	+8,379 / −2,853
Root / build / top-level tests	22	+2,783 / −212

How to review

The diff is large but highly patterned, and the two layers are independent:

Core architecture lives in src/simplnx/DataStructure/IO/Generic/*, Core/Preferences.*, Utilities/{MemoryBudgetManager,AlgorithmDispatch,SegmentFeatures,UnionFind,SliceBufferedTransfer}.*, and Filter/IFilter.cpp (Part 1).
Filter work is summarized in the inventory tables in Part 2. Every optimization is one of four patterns — reviewing one example of each pattern transfers to the rest. Each optimized algorithm's original file was renamed to the in-core variant and the OOC variant added beside it, so the diffs read as edits against the original code rather than wholesale new files.

Part 1 — Core out-of-core architecture (`src/simplnx`)

1.1 Bulk-I/O data-store API

copyIntoBuffer / copyFromBuffer on AbstractDataStore<T> (a flat range form and a tuple-addressed form) are the single mechanism for bulk reads/writes, implemented by DataStore<T> and EmptyDataStore<T>; the OOC implementation supplies its own override.
The storage-kind signal is IDataStore::StoreType { InMemory, OutOfCore, Empty }, queried via getStoreType(). The chunk-traversal API (loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds, getChunkShape) and the chunk-shape-based OOC detection are removed — bulk I/O plus getStoreType() replace them entirely.
Empty/placeholder string storage: EmptyStringStore, AbstractStringStore/StringStore placeholder support, and StringArray::isPlaceholder().
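
The chunked read-modify-write loop that this API enables can be sketched roughly as follows. This is an illustration only, not the simplnx interface: `ToyStore`, `RemapInChunks`, and the method signatures are hypothetical stand-ins, and only the names copyIntoBuffer / copyFromBuffer come from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a data store. Only the method names mirror the PR;
// the signatures are assumptions made for this sketch.
class ToyStore
{
public:
  explicit ToyStore(std::size_t size)
  : m_Data(size, 0)
  {
  }

  std::size_t size() const
  {
    return m_Data.size();
  }

  // Read elements [start, start + count) into a caller-owned buffer.
  void copyIntoBuffer(std::size_t start, std::size_t count, std::int32_t* out) const
  {
    assert(start + count <= m_Data.size());
    std::copy_n(m_Data.data() + start, count, out);
  }

  // Write a caller-owned buffer back to elements [start, start + count).
  void copyFromBuffer(std::size_t start, std::size_t count, const std::int32_t* in)
  {
    assert(start + count <= m_Data.size());
    std::copy_n(in, count, m_Data.data() + start);
  }

private:
  std::vector<std::int32_t> m_Data;
};

// Renumber every element through a lookup table, touching the store only in
// large sequential blocks instead of once per element.
inline void RemapInChunks(ToyStore& store, const std::vector<std::int32_t>& remap, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::vector<std::int32_t> buffer(chunkSize);
  for(std::size_t start = 0; start < store.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, store.size() - start);
    store.copyIntoBuffer(start, count, buffer.data()); // one bulk read
    for(std::size_t i = 0; i < count; i++)
    {
      buffer[i] = remap[static_cast<std::size_t>(buffer[i])]; // work on plain memory
    }
    store.copyFromBuffer(start, count, buffer.data()); // one bulk write
  }
}
```

With a disk-backed store, each loop iteration costs one bulk read and one bulk write regardless of how many elements the chunk holds, which is the property the filter conversions in Part 2 rely on.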

1.2 Runtime store-format resolution and IO-manager hooks

OOC capability is supplied entirely at runtime; core never names the concrete format.

IDataStoreFormatResolver: a const, thread-safe policy interface deciding which registered format a soon-to-be-created array uses ("" = in-memory). InMemoryFormatResolver is the default policy.
ArrayCreationUtilities::ResolveStorageFormat is the single decision point for every array-creation call site, with a fixed precedence: the unstructured/poly-geometry gate (ParentGeometrySupportsOoc) forces in-core, then an explicit per-filter format override, then the DataStructure's resolver. DataStructure carries a per-instance resolver plus a lazily-seeded process-wide default (neither serialized).
IDataIOManager lifecycle hooks (default no-ops): finalizesImport, onImportFinalize, onRecoveryWrite, onFinalizeStores, setBaseDirectory, shutdownManager. DataIOCollection fans these out to every registered manager, so an OOC manager participates in import finalization, recovery writes, store read-only transition, and shutdown without core knowing the specifics.
CoreDataIOManager: the always-present default manager that registers the in-memory data-store/list-store factories; an OOC manager registers on top at runtime.
Store creation goes through resolver-aware CreateDataStore / CreateListStore single entry points; CreateNeighborListAction / CreateNeighbors thread an explicit dataFormat override through to the store.
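
The fixed precedence described for ArrayCreationUtilities::ResolveStorageFormat can be illustrated with a small sketch. The function name `ResolveStorageFormatSketch`, the `FormatResolver` alias, the use of `std::optional` for the per-filter override, and the `"toy-ooc"` format name in the usage note are all assumptions for illustration; the real signatures are not shown in this description.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical policy type: maps an array's byte size to a registered format
// name. An empty string means "keep the array in memory".
using FormatResolver = std::function<std::string(std::uint64_t)>;

// Sketch of the three-step precedence:
//   1. geometry gate (unsupported parent geometry forces in-core)
//   2. explicit per-filter override
//   3. the resolver attached to the data structure
inline std::string ResolveStorageFormatSketch(bool parentGeometrySupportsOoc, const std::optional<std::string>& filterOverride, const FormatResolver& resolver, std::uint64_t arrayBytes)
{
  if(!parentGeometrySupportsOoc)
  {
    return ""; // gate wins over everything else
  }
  if(filterOverride.has_value())
  {
    return *filterOverride; // the filter asked for a specific format
  }
  if(resolver)
  {
    return resolver(arrayBytes); // fall back to the installed policy
  }
  return ""; // no resolver installed: in-memory
}
```

For example, a resolver that returns `"toy-ooc"` for arrays above some size threshold is consulted only when the geometry gate passes and the filter supplies no override.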

1.3 Storage-mode preference (+ legacy migration)

DataStorageMode { Adaptive, ForceInCore, ForceOutOfCore } (persisted as data_storage_mode) is the single source of truth for storage placement, replacing the large_data_format / force_ooc_data preferences. The enum is deliberately OOC-vocabulary-free — core states user intent; the registered manager maps it to a concrete format. dataStorageMode() migrates older preference files from the retained legacy keys; useOocData() is a convenience view (true unless ForceInCore).
Preferences seeds the OOC base directory and default format on startup via the IO-collection hooks, and gains a removeValue helper.

1.4 Memory budget + memory safety

MemoryBudgetManager: tracks the budget governing in-core vs OOC placement. defaultBudgetBytes() (50% of RAM, ≥1 GiB, clamped to the cap), maxBudgetBytes() (max(min(total−6 GiB, 0.95·total), 1 GiB)), and setBudgetBytes() which clamps to the cap and reports whether it clamped. Total-RAM detection is centralized in Memory::GetTotalMemory(). The default is clamped to the cap so it never exceeds it on <12 GiB machines (e.g. CI runners).
nxrunner --memory-budget <GB>: CLI flag wired through to the budget manager so the override takes effect in headless runs, with parsing hardened against NaN/inf/trailing garbage.
Memory-safety guards (additive to the existing -264 total-RAM hard block):
- -271 — non-blocking preflight warning when an in-core array would exceed currently-available RAM. OOC arrays are excluded (EmptyDataStore::memoryUsage() reports 0 for OOC placeholders; the format is resolved in preflight via the shared ResolveStorageFormat helper).
- -272 — a std::bad_alloc safety net at the single IFilter::execute → executeImpl boundary, turning an out-of-memory condition into a clean pipeline error instead of a crash.
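
The budget arithmetic above can be checked with a short sketch. The free functions below only restate the formulas quoted for defaultBudgetBytes() and maxBudgetBytes(); they are not the MemoryBudgetManager implementation. `ParseBudgetGB` is a hypothetical helper showing one way to reject NaN, inf, and trailing garbage, and its rejection of non-positive values is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

constexpr std::uint64_t k_GiB = 1024ULL * 1024ULL * 1024ULL;

// Cap: max(min(total - 6 GiB, 0.95 * total), 1 GiB)
inline std::uint64_t MaxBudgetBytes(std::uint64_t totalRam)
{
  // Guard the subtraction so small machines do not underflow the unsigned value.
  const std::uint64_t headroom = totalRam > 6 * k_GiB ? totalRam - 6 * k_GiB : 0;
  const auto fraction = static_cast<std::uint64_t>(0.95 * static_cast<double>(totalRam));
  return std::max(std::min(headroom, fraction), k_GiB);
}

// Default: 50% of RAM, at least 1 GiB, never above the cap.
inline std::uint64_t DefaultBudgetBytes(std::uint64_t totalRam)
{
  return std::min(std::max(totalRam / 2, k_GiB), MaxBudgetBytes(totalRam));
}

// Parse a "<GB>" command-line value. Returns nullopt for empty input,
// trailing garbage ("8GB"), NaN, inf, overflow, and non-positive values.
inline std::optional<double> ParseBudgetGB(const std::string& text)
{
  if(text.empty())
  {
    return std::nullopt;
  }
  char* end = nullptr;
  const double value = std::strtod(text.c_str(), &end);
  if(end == text.c_str() || *end != '\0')
  {
    return std::nullopt; // nothing parsed, or characters left over
  }
  if(!std::isfinite(value) || value <= 0.0)
  {
    return std::nullopt;
  }
  return value;
}
```

On an 8 GiB machine the cap is 2 GiB (8 − 6), so the 4 GiB half-of-RAM default is clamped down to 2 GiB. The 50% default exceeds the cap only below 12 GiB of total RAM, which is the case called out above.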

1.5 HDF5 streaming write + compression

DatasetIO provides the streaming OOC write path — createEmptyDataset + writeSpanHyperslab for arrays too large to be resident — alongside the single-shot writeSpan; reads use readChunk / readChunkIntoSpan.
The streaming path honors requested compression: createEmptyDataset builds the chunked-deflate creation property list (BuildChunkedDeflateDcpl), so a large OOC array taking the two-step streaming write is compressed identically to the single-shot path. Contiguous storage is still used for compression level 0 and for arrays below the small-array threshold.

1.6 `.dream3d` loader API

Dream3dIO exposes a LoadDataStructure family: LoadDataStructure, LoadDataStructureMetadata (metadata-only for preflight), LoadDataStructureArrays (array subset), plus resolver-aware overloads that stamp a per-DataStructure resolver before import finalization (so a read-only visualization load can direct arrays to disk for fast first-show). Import is eager or deferred based on anyManagerFinalizesImport(); writes run under a recovery-write guard supplied by the registered manager.
ImportH5ObjectPathsAction uses this API: metadata-only on preflight, full load on execute, with a selective shortest-path-first merge of only the requested paths.

1.7 Core algorithm infrastructure

AlgorithmDispatch.hpp — DispatchAlgorithm<InCoreAlgo, OocAlgo>(arrays, args…), a free function template selecting between an in-core and an OOC algorithm class at runtime. Priority: ForceInCoreAlgorithm() > any array OOC > ForceOocAlgorithm() > in-core default. RAII test guards ForceOocAlgorithmGuard(bool) and ForceInCoreAlgorithmGuard let a single build exercise both paths.
SegmentFeatures OOC path — executeCCL(): Z-slice connected-component labeling with a 2-slice rolling buffer and UnionFind equivalence tracking (Face and FaceEdgeVertex connectivity, optional periodic BCs), replacing random-access BFS/DFS flood-fill on disk-backed data.
UnionFind — vector-based disjoint set with union-by-rank and path-halving.
SliceBufferedTransfer — type-dispatched Z-slice buffered tuple copy for morphological / neighbor-replacement transfer phases.
Extent — region/range math helper (with unit tests).
AlignSections OOC path — bulk slice read/write transfer for the align-sections family.
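
For readers unfamiliar with the UnionFind entry above, a generic vector-based disjoint set with union-by-rank and path halving looks roughly like this. The class name `UnionFindSketch` and its interface are illustrative assumptions, not the simplnx class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Textbook disjoint-set structure; not the simplnx UnionFind API.
class UnionFindSketch
{
public:
  explicit UnionFindSketch(std::size_t count)
  : m_Parent(count)
  , m_Rank(count, 0)
  {
    for(std::size_t i = 0; i < count; i++)
    {
      m_Parent[i] = i; // every element starts as its own root
    }
  }

  // Find the root of x, shortening the path as we walk it.
  std::size_t find(std::size_t x)
  {
    assert(x < m_Parent.size());
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]]; // path halving: point x at its grandparent
      x = m_Parent[x];
    }
    return x;
  }

  // Merge the sets containing a and b; the shallower tree hangs under the deeper one.
  void unite(std::size_t a, std::size_t b)
  {
    std::size_t rootA = find(a);
    std::size_t rootB = find(b);
    if(rootA == rootB)
    {
      return;
    }
    if(m_Rank[rootA] < m_Rank[rootB])
    {
      std::swap(rootA, rootB);
    }
    m_Parent[rootB] = rootA;
    if(m_Rank[rootA] == m_Rank[rootB])
    {
      m_Rank[rootA]++;
    }
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank;
};
```

Path halving needs no recursion and no second pass, which keeps `find` cheap when it is called once per voxel neighbor during labeling.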

1.8 Core utility bulk-I/O conversions

These live in core utilities and benefit every caller; each is guarded by a runtime storage-type check that preserves the original in-core code path:

DataArrayUtilities — ImportFromBinaryFile, AppendData, CopyData, and the mirror swap_ranges ops route through chunked bulk I/O when OOC. (Powers ReadRawBinary and AppendImageGeometry's mirror.)
DataGroupUtilities::RemoveInactiveObjects — chunked featureIds renumbering via copyIntoBuffer/copyFromBuffer.
ClusteringUtilities::RandomizeFeatureIds — chunked bulk I/O (both overloads; benefits segmentation filters, SharedFeatureFace, MergeTwins).
GeometryHelpers — FindElementsContainingVert / FindElementNeighbors use 65K-element chunked passes with a current-chunk cache-hit check before falling back to per-element reads.
ImageRotationUtilities — Z-slab source cache for nearest-neighbor and a ±2-slice trilinear margin, sliding-window slab updates (memmove + delta reads), and intra-slice parallelism. This is how ApplyTransformationToGeometry and RotateSampleRefFrame get their OOC speedups (no plugin algorithm file changes for ApplyTransformation).
TriangleUtilities — bulk-load triangles/labels for winding repair.
H5DataStore — streaming row-batch FillOocDataStore replacing full-dataset allocation on import.
RectGridGeom / ImageGeom findElementSizes — route through the resolver-aware CreateDataStore (voxel-sizes array can go OOC); the RectGrid inner loop refactored to per-axis precompute + Z-slice copyFromBuffer.

Part 2 — Filter optimizations

Optimization patterns

Every filter optimization is one of these four shapes:

(a) Dispatch split — DispatchAlgorithm<…Direct, …Scanline>: an in-core Direct class (unchanged or parallel) and an OOC Scanline class that streams Z-slices / chunks. The original Foo.cpp becomes a thin dispatcher.
(b) CCL split — DispatchAlgorithm<…BFS, …CCL> (or an in-file branch): random-access flood-fill in-core, sequential Z-slice connected-component labeling for OOC.
(c) Single-implementation bulk I/O — one algorithm that reads/writes in chunks and caches feature/ensemble-level arrays in local std::vectors, with a runtime storage-type check so in-core stays optimal.
(d) Slice-buffered / safety — rolling Z-slice buffers via SliceBufferedTransfer, or an OOC-correctness guard (e.g. disabling threading for OOC stores) / progress-and-cancel additions.
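
The dispatch used by patterns (a) and (b) can be sketched as below, following the priority stated in section 1.7 (ForceInCoreAlgorithm() > any array OOC > ForceOocAlgorithm() > in-core default). Everything here is a simplified assumption: the real DispatchAlgorithm inspects the arrays' store types, whereas this sketch takes a precomputed `anyArrayOoc` flag, and `DispatchSketch`, `ForceInCoreFlag`, `ForceOocFlag`, `ForceOocGuard`, `SumDirect`, and `SumScanline` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical process-wide test overrides.
inline bool& ForceInCoreFlag()
{
  static bool s_Value = false;
  return s_Value;
}

inline bool& ForceOocFlag()
{
  static bool s_Value = false;
  return s_Value;
}

// RAII guard: sets the OOC override for one scope and restores it on exit.
class ForceOocGuard
{
public:
  explicit ForceOocGuard(bool value)
  : m_Previous(ForceOocFlag())
  {
    ForceOocFlag() = value;
  }
  ~ForceOocGuard()
  {
    ForceOocFlag() = m_Previous;
  }
  ForceOocGuard(const ForceOocGuard&) = delete;
  ForceOocGuard& operator=(const ForceOocGuard&) = delete;

private:
  bool m_Previous;
};

// Construct and run exactly one of the two algorithm classes.
template <class InCoreAlgo, class OocAlgo, class... Args>
std::string DispatchSketch(bool anyArrayOoc, Args&&... args)
{
  bool useOoc = false;
  if(ForceInCoreFlag())
  {
    useOoc = false; // highest priority
  }
  else if(anyArrayOoc)
  {
    useOoc = true; // real disk-backed data
  }
  else if(ForceOocFlag())
  {
    useOoc = true; // test override on in-memory data
  }
  if(useOoc)
  {
    return OocAlgo(std::forward<Args>(args)...)();
  }
  return InCoreAlgo(std::forward<Args>(args)...)();
}

// Toy algorithm classes that only report which path ran.
struct SumDirect
{
  explicit SumDirect(int n)
  : m_N(n)
  {
  }
  std::string operator()() const
  {
    return "direct:" + std::to_string(m_N);
  }
  int m_N;
};

struct SumScanline
{
  explicit SumScanline(int n)
  : m_N(n)
  {
  }
  std::string operator()() const
  {
    return "scanline:" + std::to_string(m_N);
  }
  int m_N;
};
```

A call such as `DispatchSketch<SumDirect, SumScanline>(false, 3)` runs the Direct class; wrapping the same call in a `ForceOocGuard guard(true);` scope runs the Scanline class against in-memory data, which is the idea behind the dual-path tests in Part 4.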

Algorithm structure: in-core + out-of-core variants

For dispatch-split filters the original algorithm file is renamed to the in-core variant and the OOC variant is added beside it; the original filename remains as a thin dispatcher:

Filter	In-core variant	OOC variant
FillBadData	`FillBadDataBFS`	`FillBadDataCCL`
IdentifySample	`IdentifySampleBFS`	`IdentifySampleCCL`
ComputeBoundaryCells	`…Direct`	`…Scanline`
ComputeFeatureNeighbors	`…Direct`	`…Scanline`
ComputeSurfaceFeatures	`…Direct`	`…Scanline`
ComputeSurfaceAreaToVolume	`…Direct`	`…Scanline`
ComputeFeatureSizes	`…Direct`	`…Scanline`
MultiThresholdObjects	`…Direct`	`…Scanline`
DBSCAN	`…Direct`	`…Scanline`
ComputeKMedoids	`…Direct`	`…Scanline`
QuickSurfaceMesh	`…Direct`	`…Scanline`
SurfaceNets	`…Direct`	`…Scanline`
BadDataNeighborOrientationCheck (OA)	`…Worklist`	`…Scanline`
ComputeGBCDPoleFigure (OA)	`…Direct`	`…Scanline`

New shared header IdentifySampleCommon.hpp (SimplnxCore) provides the VectorUnionFind and per-slice functor shared by the BFS/CCL variants; TupleTransfer.hpp gains quickSurfaceTransferBatch / surfaceNetsTransferBatch bulk APIs used by the mesh Scanline variants.
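
To make the CCL variants concrete, here is a simplified sketch of Z-slice connected-component labeling with a two-slice rolling buffer. It is not the code in this PR: `LabelSliceBySlice` is a hypothetical function, it handles face connectivity only (no FaceEdgeVertex, no periodic boundaries), it merges by smallest label rather than by rank, and it reads from a `std::vector` where the real OOC path would issue one bulk read and one bulk write per slice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Face-connected labeling of a binary volume, one Z-slice at a time.
// Returns the component count; labels receives 1-based ids (0 = background).
inline std::int32_t LabelSliceBySlice(const std::vector<std::uint8_t>& mask, std::size_t nx, std::size_t ny, std::size_t nz, std::vector<std::int32_t>& labels)
{
  assert(mask.size() == nx * ny * nz);
  const std::size_t sliceSize = nx * ny;
  labels.assign(mask.size(), 0);

  std::vector<std::int32_t> parent(1, 0); // index 0 is reserved for background
  auto find = [&parent](std::int32_t x) {
    while(parent[x] != x)
    {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  };
  auto unite = [&](std::int32_t a, std::int32_t b) {
    a = find(a);
    b = find(b);
    if(a != b)
    {
      parent[std::max(a, b)] = std::min(a, b); // smaller label becomes the root
    }
  };

  // Pass 1: provisional labels. Only the previous and current slice are held.
  std::vector<std::int32_t> prev(sliceSize, 0);
  std::vector<std::int32_t> cur(sliceSize, 0);
  for(std::size_t z = 0; z < nz; z++)
  {
    const std::uint8_t* maskSlice = mask.data() + z * sliceSize; // stands in for one bulk slice read
    for(std::size_t y = 0; y < ny; y++)
    {
      for(std::size_t x = 0; x < nx; x++)
      {
        const std::size_t i = y * nx + x;
        cur[i] = 0;
        if(maskSlice[i] == 0)
        {
          continue;
        }
        // Already-visited face neighbors: -X, -Y (this slice) and -Z (previous slice).
        const std::int32_t neighbors[3] = {x > 0 ? cur[i - 1] : 0, y > 0 ? cur[i - nx] : 0, z > 0 ? prev[i] : 0};
        std::int32_t label = 0;
        for(const std::int32_t n : neighbors)
        {
          if(n == 0)
          {
            continue;
          }
          if(label == 0)
          {
            label = n;
          }
          else
          {
            unite(label, n); // record that two provisional labels touch
          }
        }
        if(label == 0)
        {
          label = static_cast<std::int32_t>(parent.size());
          parent.push_back(label); // brand-new provisional label
        }
        cur[i] = label;
      }
    }
    std::copy(cur.begin(), cur.end(), labels.data() + z * sliceSize); // stands in for one bulk slice write
    std::swap(prev, cur);
  }

  // Pass 2: collapse equivalences and renumber to consecutive ids.
  std::vector<std::int32_t> finalId(parent.size(), 0);
  std::int32_t count = 0;
  for(std::int32_t& value : labels)
  {
    if(value == 0)
    {
      continue;
    }
    const std::int32_t root = find(value);
    if(finalId[root] == 0)
    {
      finalId[root] = ++count;
    }
    value = finalId[root];
  }
  return count;
}
```

The working set during pass 1 is two slices plus the equivalence table, independent of the number of slices, and every voxel is visited in storage order. A flood-fill, by contrast, follows connectivity wherever it leads and so jumps between chunks.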

SimplnxCore inventory

Filter	Pattern	Technique
FillBadData	(b)	`DispatchAlgorithm<FillBadDataBFS, FillBadDataCCL>`; CCL streams Z-slices with core `UnionFind`
IdentifySample	(b)	`DispatchAlgorithm<IdentifySampleBFS, IdentifySampleCCL>`; scanline labeling via `VectorUnionFind`
ScalarSegmentFeatures	(b)	In-file OOC branch to `SegmentFeatures::executeCCL()` (Z-slice CCL)
ComputeBoundaryCells	(a)	Scanline Z-slice rolling-window neighbor reads
ComputeSurfaceFeatures	(a)	Scanline Z-slice rolling window
ComputeFeatureNeighbors	(a)	Scanline Z-slice rolling window
ComputeSurfaceAreaToVolume	(a)	Scanline Z-slice rolling window
ComputeFeatureSizes	(a)	Direct = tbb parallel Kahan accumulation; Scanline = chunked `copyIntoBuffer` (256K-tuple)
MultiThresholdObjects	(a)	Scanline eliminates the O(n) temp result vector
DBSCAN	(a)	Chunked grid build + on-demand per-cell coordinate reads in `canMerge`
ComputeKMedoids	(a)	Chunked `findClusters`; bounded per-cluster peak memory
QuickSurfaceMesh	(a)	Scanline drops the O(volume) `nodeIds` for rolling 2-plane node buffers; `quickSurfaceTransferBatch`
SurfaceNets	(a)	Scanline reimplements with a hash-map surface store; `surfaceNetsTransferBatch`
ComputeFeatureCentroids	(c)	Plain `std::vector` accumulators (no DataStore); 64K-tuple chunked featureIds
ComputeFeatureClustering	(c)	Feature-level array caching; RDF in local vectors
ComputeEuclideanDistMap	(c)	Bulk-read into local vectors, flood-fill in RAM, bulk-write
ComputeFeaturePhases	(c)	65K-tuple chunked featureIds + cellPhases; feature-level vectors replace per-cell map
RequireMinimumSizeFeatures	(c)	Chunked feature removal + 3-slice rolling slab for `assignBadVoxels` voting; sparse changed-voxel tracking
CropImageGeometry	(c)	K=32 Z-slice batched slab I/O (O(n^⅔) working set)
ExtractInternalSurfacesFromTriangleGeometry	(c)	`triMask` bitset + sparse `triPrefixSum` popcount (~6.4× less memory); 65K-element streamed passes
ComputeTriangleAreas	(c)	Filter-level chunked triangle connectivity + span-bounded vertex loads; parallel compute on local buffers
RegularGridSampleSurfaceMesh	(c/d)	`ZSliceWorker` parallel Z-slice rasterize → mutex-guarded bulk `copyFromBuffer`
ErodeDilateBadData	(d)	`SliceBufferedTransfer` per-Z commit + Z-slice neighbor reads
ErodeDilateCoordinationNumber	(d)	`SliceBufferedTransfer` per-Z commit
ErodeDilateMask	(c/d)	Slice-based bulk mask erosion/dilation
ReplaceElementAttributesWithNeighborValues	(d)	`SliceBufferedTransfer` per-Z best-neighbor commit
AlignSectionsFeatureCentroid	(d)	In-file OOC branch `findShiftsOoc()` with per-Z-slice mask reads
WriteAvizoRectilinearCoordinate / WriteAvizoUniformCoordinate	(c)	Bulk `copyIntoBuffer` for coordinate/data output
ComputeArrayStatistics	(d)	OOC-safety: parallelization disabled when stores are OOC
ReadHDF5Dataset / ReadStlFile	(d)	Cancel checks + throttled progress

OrientationAnalysis inventory

Filter	Pattern	Technique
ComputeIPFColors	(a)	`DispatchAlgorithm<…Direct, …Scanline>`; Direct keeps parallel in-core, Scanline 65K-tuple chunks + cached crystal structures; color key forwarded to the OOC path
ComputeGBCDPoleFigure	(a)	Dispatched at the filter level; Scanline caches only the phase-of-interest GBCD slice
BadDataNeighborOrientationCheck	(a)	`DispatchAlgorithm<…Worklist, …Scanline>`; bool-mask reads routed through bulk I/O
EBSDSegmentFeatures	(b)	`SegmentFeatures::executeCCL()` slice-by-slice, replacing DFS flood-fill
CAxisSegmentFeatures	(b)	Same CCL architecture as EBSDSegmentFeatures
AlignSectionsMisorientation	(c)	`findShiftsOoc()`, 2-slice buffered quats/phases/mask + cached crystal structures
AlignSectionsMutualInformation	(c)	Per-slice bulk reads of phases/quats/mask + cached crystal structures
NeighborOrientationCorrelation	(c/d)	3-slice rolling window via `SliceBufferedTransfer`; cached crystal structures
ComputeAvgOrientations	(c)	Chunked featureIds/phases/quats; cached crystal structures + avgQuats; bulk write
ComputeFeatureReferenceMisorientations	(c)	Chunked cell-level arrays; cached crystal structures, avgQuats, center quats
ComputeKernelAvgMisorientations	(c)	Per-Z-plane `[plane−kZ, plane+kZ]` slab reads; cached crystal structures
ComputeAvgCAxes	(c)	4096-tuple chunked reads; feature-level avgCAxes + crystal structures cached
ComputeCAxisLocations	(c)	64K-tuple chunked read/process/write; cached crystal structures
ComputeFeatureReferenceCAxisMisorientations	(c)	Z-slice buffered cell-level I/O; cached crystalStructures + avgCAxes
ComputeFeatureNeighborCAxisMisalignments	(c)	Bulk-read feature-level arrays; buffered output
ComputeTwinBoundaries	(c)	Bulk-read all face/feature/ensemble arrays into local vectors
MergeTwins	(c)	Chunked voxel parent-ID fill + assignment; feature-level parentIds cached
ConvertOrientations	(c)	Macro-generated convertors process each range in 4096-tuple chunks (bulk read→convert→bulk write)
RotateEulerRefFrame	(c)	64K-tuple in-place read-modify-write chunks
ComputeGBCD	(c)	Feature-level caching; chunked 50K-triangle reads; GBCD accumulated locally then `copyFromBuffer`
ComputeGBCDMetricBased	(c)	Per-chunk sequential area accumulation (no O(n) `triIncluded`); feature-level caching; raw-pointer parallel selector
ComputeGBPDMetricBased	(c)	Caches feature eulers/phases + ensemble crystal structures; chunked triangle reads (distinct filter from GBCD MetricBased)
WriteGBCDGMTFile	(c)	Caches phase-of-interest GBCD slice; crystal structures cached
WriteGBCDTriangleData	(c)	8K-triangle chunked reads; feature-level Euler cache; output buffered via `fmt::memory_buffer`
ReadAngData / ReadCtfData	(c)	Bulk `copyFromBuffer` for all cell arrays; chunked Euler interleave; in-place phase validation
ReadH5Ebsd	(c)	`copyFromBuffer` in CopyData template, phase copy, Euler interleave
ReadH5EspritData	(c)	Bulk `copyFromBuffer` from raw HDF5 reader buffers
WritePoleFigure	(c)	Per-phase chunked input reads with bounded buffers; bulk `copyFromBuffer` for intensity/image outputs

Cancel + progress

In-core and OOC variants gained m_ShouldCancel checks at the top of major loops and ThrottledMessenger-based progress reporting with per-phase messages and percent-complete.

Part 3 — Performance results

All benchmarks on an arm64 Release build forcing the out-of-core path (DataStorageMode::ForceOutOfCore, or ForceOocAlgorithmGuard in tests).

Mesh Generation Filters (full ctest wall-clock, OOC build)

Test	Before (s)	After (s)	Speedup
QuickSurfaceMesh: Base	11.30	0.19	59x
QuickSurfaceMesh: Winding	22.70	0.22	103x
QuickSurfaceMesh: Problem Voxels	11.18	0.19	59x
QuickSurfaceMesh: Winding+PV	21.96	0.22	100x
SurfaceNets: Default	176	2.40	73x
SurfaceNets: Smoothing	224	2.62	85x
SurfaceNets: Winding	515	2.86	180x
SurfaceNets: Winding Smoothing	416	3.22	129x

Groups B–E (200³ dataset, filter.execute() only)

Filter	Before (s)	After (s)	Speedup
ComputeBoundaryCells	6.69	0.25	27x
ComputeSurfaceFeatures	4.01	0.28	14x
ComputeFeatureNeighbors	8.93	0.81	11x
ComputeSurfaceAreaToVolume	8.59	0.24	36x
BadDataNeighborOrientationCheck	97.1	5.25	18x
ErodeDilateBadData	25.09	3.80	7x
ErodeDilateCoordinationNumber	12.43	2.30	5x
ErodeDilateMask	6.43	0.40	16x
ReplaceElementAttrsWithNeighborValues	6.05	4.00	1.5x
NeighborOrientationCorrelation	67.94	5.70	12x
ScalarSegmentFeatures	708.3	1.77	400x
EBSDSegmentFeatures	972.6	2.10	463x
CAxisSegmentFeatures	824.1	1.39	593x
FillBadData	8.6	2.26	4x
IdentifySample	825.0	0.27	3056x
AlignSectionsMisorientation	32.89	0.80	41x
AlignSectionsMutualInformation	15.61	0.81	19x
AlignSectionsFeatureCentroid	8.41	0.39	22x
AlignSectionsListFilter	7.50	0.39	19x

Pipeline-Critical Filters (filter.execute() only, OOC build)

Filter	Before	After	Speedup
ComputeFeatureCentroids	39.7s	25ms	1,589x
RequireMinimumSizeFeatures	20.2s	210ms	96x
ComputeIPFColors	1.94s	90ms	21.5x
ComputeFeatureSizes	813ms	28ms	29x
ComputeFeatureReferenceMisorientations (AvgOri)	106ms	1ms	106x
ComputeFeatureReferenceMisorientations (EuclDist)	136ms	1ms	136x

OrientationAnalysis Filters (full ctest wall-clock, OOC build)

Filter	Before (s)	After (s)	Speedup
ComputeFeatureReferenceCAxisMisorientations	196	5.4	36x
ComputeEuclideanDistMap	116	1.1	105x

GBCD Filter Group (full ctest wall-clock)

Filter	Before (s)	After (s)	Speedup
ComputeGBCDPoleFigure	833 (fail)	2.4	350x
ComputeGBCD	1500 (timeout)	~10	≥150x
WriteGBCDGMTFile	162 (fail)	6.0	27x
ComputeGBCDMetricBased	38.1	28.9	1.3x
WriteGBCDTriangleData	23.5	19.2	1.2x

HDF5 Import + Pole Figure Filters (full ctest wall-clock)

Filter	Before (s)	After (s)	Speedup
WritePoleFigure (3 tests)	4500 (timeout)	11.7	≥385x
ReadH5EspritData (3 tests)	2060 (timeout)	6.8	≥303x
ReadHDF5Dataset	1500 (timeout)	6.7	≥224x

Additional Optimizations (full ctest wall-clock)

Filter	Before (s)	After (s)	Speedup
ReadRawBinary (Case1)	1076	29	37x
ComputeGBCDPoleFigure	853	0.9	948x
DBSCAN 3D	653	12	54x
AlignSectionsMisorientation Pipeline	635	5.9	107x
ReadH5Ebsd	463	2.1	220x
ReadCtfData	231	0.25	924x
AppendImageGeometry	469	113	4.2x
ComputeFeatureClustering	203	77	2.6x
ComputeTwinBoundaries	179	44	4x
MergeTwins	67	1.8	37x
ComputeKMedoids	74	13	5.7x
CropImageGeometry (X)	27	2.6	10x
WriteAvizoRectilinear	22.8	2.3	10x
WriteAvizoUniform	22.3	2.0	11x

Geometry / Mesh / Phase Filters

Filter	Before	After	Note
ApplyTransformationToGeometry (trilinear, CT_align 1.97 B-voxel)	133s	20s	~6.6x; via `ImageRotationUtilities` slab cache
ComputeTriangleAreas (CT_align mesh)	26s	<1s	>26x
ComputeFeaturePhases (748,800 cells, disk-backed)	11.7s	35ms	feature-level vectors + chunked reads
ComputeGBPDMetricBased	20.5s	6.3s	~3.3x; feature/ensemble caching
ExtractInternalSurfacesFromTriangleGeometry	—	—	~6.4x less triangle-side memory (bitset + popcount)

Part 4 — Test infrastructure

Comparison functions rewritten for chunked bulk I/O (UnitTestCommon.hpp): CompareDataArrays, CompareDataArraysByComponent, CompareArrays, CompareFloatArraysWithNans stream both arrays in 40K-element chunks via copyIntoBuffer instead of per-element operator[] (per-element access on OOC arrays is pathologically slow).
New store-type helpers: ExpectedStoreType() (derives the expected StoreType from the active DataStorageMode + whether an OOC manager is registered) and RequireExpectedStoreType().
PreferencesSentinel takes a DataStorageMode and restores the original preferences on destruction.
TestFileSentinel reference-counts archive extraction via per-process holder files, so parallel test runs don't delete shared decompressed data prematurely.
LoadDataStructure test helper calls DREAM3D::LoadDataStructure(path) directly.
New SegmentFeaturesTestUtils.hpp (~638 lines): shared builders/verifiers for the SegmentFeatures family.
Dual-path testing: ForceOocAlgorithmGuard + GENERATE(from_range(k_ForceOocTestValues)) runs each case in both in-core and forced-OOC modes — adopted by 23 plugin test files (16 SimplnxCore, 7 OrientationAnalysis).
8 new top-level tests in test/: DataStoreFormatResolverTest, DataStorageModeMigrationTest, Dream3dLoadingApiTest, EmptyStringStoreTest, ExtentTest, MemoryBudgetManagerTest, MemorySafetyTest, IParallelAlgorithmTest.
RotateSampleRefFrame / RotateEulerRefFrame test paths use slab/chunked bulk I/O.

Part 5 — Build system

OOC is a runtime capability — it is selected entirely by the registered IO manager; there is no compile-time OOC switch. cmake/SimplnxConfig.hpp.in is a generated PUBLIC, ODR-safe config header.
SIMPLNX_TEST_ALGORITHM_PATH cache option (default 0): 0=Both, 1=OOC-only, 2=InCore-only; plumbed into every plugin test as a compile definition (cmake/Plugin.cmake). An in-core build can validate both algorithm paths (forcing the Scanline path against in-core data still runs correctly and fast).
SIMPLNX_UNIT_TEST_TARGETS global property — create_simplnx_plugin_unit_test records each test target so consumer (add_subdirectory) builds can attach extra sources/settings.
zlib as a direct vcpkg dependency — the OOC layer compiled into consumers uses it directly for parallel deflate-chunk decompression off the global HDF5 mutex (and UnitTestCommon uses it for the in-house tar.gz extractor).
/bigobj for the MSVC build of Dream3dIO.cpp (exceeds the COMDAT limit / C1128 in Debug).
New core headers/sources registered in CMake (Extent, EmptyStringStore, IDataStoreFormatResolver, InMemoryFormatResolver, MemoryBudgetManager, AlgorithmDispatch, SliceBufferedTransfer, UnionFind, IdentifySampleCommon).

Part 6 — Documentation

48 filter docs updated under src/Plugins/*/docs/ (27 SimplnxCore, 21 OrientationAnalysis; ~+1,177 lines), each adding an ## Algorithm section with ### Performance and paired In-Core / Out-of-Core subsections that explain the dual implementation, memory footprint, and chunk/slab streaming strategy. No docs deleted.

Part 7 — Test data archives

Three download_test_data() entries added:

fill_bad_data_exemplars.tar.gz (SimplnxCore)
identify_sample_exemplars.tar.gz (SimplnxCore)
segment_features_exemplars.tar.gz (referenced by both SimplnxCore and OrientationAnalysis)

No existing archive entries were removed.

Related PR

BlueQuartzSoftware/DREAM3DNX#1121 — DREAM3D-NX OOC visualization architecture; consumes the runtime resolver/loader APIs added here and depends on this PR merging first.

Test Plan

In-core build (SIMPLNX_TEST_ALGORITHM_PATH=2) passes
Out-of-core build (SIMPLNX_TEST_ALGORITHM_PATH=1) passes
Both algorithm paths (SIMPLNX_TEST_ALGORITHM_PATH=0) pass
All optimized filters produce identical results on both algorithm paths
In-core performance verified: no regression on the core-utility changes (CopyData, AppendData, mirror swaps)

* Rewrite the markdown Algorithm section to explain the crop as a 3D subarray copy from first principles, teach the Z-slice-batched bulk I/O strategy step-by-step, and quantify why batching by K Z-slices collapses HDF5 chunk-op overhead
* Add a Doxygen block on CropImageGeomDataArray describing the per-pass pipeline (bulk read slab -> in-memory extract -> bulk write) and the O(slab), non-O(volume) memory bound

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Rewrite the Algorithm section so a reader unfamiliar with the filter can follow the two-phase pipeline end-to-end:

* Phase 1 (feature removal): motivate why small features get pruned, describe the 64K-tuple chunked scan, and explain the "skip write when chunk unchanged" optimization
* Phase 2 (gap fill by majority-vote): teach the rolling 3-slice buffer scan, the sparse parallel vectors that replace the old O(n) dense index array, the per-array ChunkedTransferWorker with its +/-1 Z-margin slab read + interior-only write-back, and the outer ParallelTaskAlgorithm across arrays
* Add a memory-footprint summary clarifying that every data structure is O(slice) or O(iteration bad count), never O(volume)

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Add a new Algorithm section that teaches the filter from scratch:

* Explain conceptually which triangles are kept (all three vertices inside the user-specified node-type range) and what the output geometry looks like (compact vertex list, compact triangle list, remapped connectivity)
* Document the downstream-invariant that forces vertNewIndex to stay a dense per-vertex map (triangle 0's three fresh vertices land at new indices 0..2 in traversal order)
* Explain the triMask bitset + triPrefixSum sparse popcount table that replaces the legacy dense triangle map for ~6.4x memory savings, and how remapIndex() turns an O(1) table lookup plus a small popcount into each triangle's compact new index
* Walk the six streaming passes (vertex-ok mask, triangle scan + vertex-index assignment, prefix-sum build, vertex copy, triangle remap copy, per-vertex/per-triangle attached-array copy)
* Summarize the memory footprint so the vertNewIndex dominance is clear on very large meshes

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
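
The bitset-plus-prefix-sum idea named in this commit message can be sketched generically as a rank query: store one bit per element, keep a running count of set bits at coarse intervals, and finish each lookup with a popcount inside one word. The class below is an illustration under assumptions, not the filter's code. `CompactIndexMap` is a hypothetical name, and the one-prefix-entry-per-64-bit-word granularity is a guess; the real triPrefixSum table may be sparser.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps each kept element to its position in the compacted output without
// storing a full-width index per element.
class CompactIndexMap
{
public:
  explicit CompactIndexMap(const std::vector<bool>& keep)
  : m_Words((keep.size() + 63) / 64, 0)
  , m_Prefix((keep.size() + 63) / 64, 0)
  {
    for(std::size_t i = 0; i < keep.size(); i++)
    {
      if(keep[i])
      {
        m_Words[i / 64] |= (std::uint64_t{1} << (i % 64));
      }
    }
    // m_Prefix[w] = number of kept elements in all words before word w.
    std::uint64_t running = 0;
    for(std::size_t w = 0; w < m_Words.size(); w++)
    {
      m_Prefix[w] = running;
      running += std::bitset<64>(m_Words[w]).count();
    }
    m_Count = running;
  }

  bool isKept(std::size_t i) const
  {
    return ((m_Words[i / 64] >> (i % 64)) & 1u) != 0;
  }

  // Compact index of kept element i: kept elements in earlier words plus
  // kept elements at lower bit positions in the same word.
  std::uint64_t remapIndex(std::size_t i) const
  {
    assert(isKept(i));
    const std::uint64_t below = m_Words[i / 64] & ((std::uint64_t{1} << (i % 64)) - 1);
    return m_Prefix[i / 64] + std::bitset<64>(below).count();
  }

  std::uint64_t keptCount() const
  {
    return m_Count;
  }

private:
  std::vector<std::uint64_t> m_Words;
  std::vector<std::uint64_t> m_Prefix;
  std::uint64_t m_Count = 0;
};
```

Spacing the prefix entries further apart than one per word shrinks the table at the cost of a few more popcounts per lookup. That trade-off is presumably where the memory saving quoted in the commit comes from, though this sketch does not reproduce that figure.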

Add a comprehensive Algorithm section covering both the node-geometry and image-geometry paths from first principles:

* Describe how every supported transform (rotation, scale, manual matrix, etc.) collapses to a single 4x4 homogeneous matrix M and how M composes with prior transforms
* Node geometries: walk the 16K-vertex chunked read -> multiply -> write pipeline and explain why in-place topology+attribute data is correct
* Image geometries: teach the re-gridding problem (why output voxels need to look up source values via M^-1), and contrast nearest-neighbor vs. trilinear interpolation
* Z-slice slab cache: analytically deriving the per-output-slice source-Z range and the +/-2 trilinear margin
* Sliding-window slab updates via memmove + delta copyIntoBuffer reads when consecutive output slices overlap heavily
* Intra-slice parallelism via ParallelDataAlgorithm with thread safety argued from shared-read + disjoint-write access patterns and per-thread pValues scratch

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Add an Algorithm section that walks the chunked pipeline step-by-step for a reader unfamiliar with the optimization:

* Establish the closed-form per-triangle math (0.5 * |(A-B) x (A-C)|) so there is no confusion about the compute
* Quantify the naive access pattern (six OOC chunk-cache hits per triangle, hundreds of millions of virtual dispatches on CT-scale meshes) to motivate the chunking
* Walk the five-step per-chunk pipeline: bulk triangle connectivity read -> analyze vertex-index span -> span-bounded bulk vertex coords read -> parallel compute on plain buffers -> bulk area write
* Explain the 16M-vertex span cap and the serial per-triangle fallback for pathological meshes
* Summarize memory footprint (bounded O(chunk), not O(mesh))

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Rewrite the Algorithm section to fully teach the filter:

* State what the three output arrays (NumElements, Volume, EquivalentDiameter) represent and show the spherical/circular diameter formulas
* Image Geometry path: explain the uniform-voxel-volume shortcut that lets the filter skip per-voxel volume computations, then walk the 256K-tuple chunked count pass and the per-feature output pass; cover the 2D fallback rules and the two-empty-dimensions preflight error
* RectGrid path: contrast with the Image case, describe the lockstep FeatureIds + elementSizes chunked read, and explain why Kahan summation is needed to avoid float32 rounding error on billion-voxel volumes
* Justify the 256K chunk size choice based on HDF5 chunk-lookup overhead vs. L2 cache residency
* Summarize memory footprint

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
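
The float32 rounding problem that motivates Kahan summation in this commit message can be shown in a few lines. This is the textbook compensated sum, not the filter's accumulator; `KahanSum` and `NaiveSum` are hypothetical names, and the sketch assumes the compiler is not run with value-unsafe floating-point options such as `-ffast-math`, which can optimize the compensation away.

```cpp
#include <cassert>
#include <vector>

// Plain running sum: once the total is large, small addends are rounded away.
inline float NaiveSum(const std::vector<float>& values)
{
  float sum = 0.0f;
  for(const float v : values)
  {
    sum += v;
  }
  return sum;
}

// Kahan (compensated) summation: carries the rounding error of each addition
// forward and feeds it back into the next one.
inline float KahanSum(const std::vector<float>& values)
{
  float sum = 0.0f;
  float compensation = 0.0f; // low-order bits lost so far
  for(const float v : values)
  {
    const float y = v - compensation; // re-inject the previously lost bits
    const float t = sum + y;          // big + small: low bits of y may be lost
    compensation = (t - sum) - y;     // recover exactly what was lost
    sum = t;
  }
  return sum;
}
```

Above 2^24 = 16,777,216 a float cannot represent every integer, so adding 1.0f four times to 16777216.0f leaves the naive total unchanged while the compensated sum reaches 16777220.0f. A billion-voxel volume accumulates per-voxel sizes well past that range.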

…x bool-mask bulk I/O

Three logically related changes that finish reconciling the rebased branch with Nathan Young's PR BlueQuartzSoftware#1590 (ENH: Standardize 2D Image Handling) and fix one resulting OOC perf cliff:

1. Wholesale port of PR BlueQuartzSoftware#1590's two algorithm rewrites into the renamed in-core dispatch variants:
   - ComputeFeatureNeighborsDirect.cpp gets Nathan's templated ComputeFeatureNeighborsFunctor<ImageDimensionStateT> and ProcessVoxels dispatcher in place of the OOC-commit-era custom in-core logic.
   - IdentifySampleBFS.cpp gets Nathan's templated IdentifySampleFunctor plus the corresponding ProcessVoxels dispatch.

   The Scanline OOC variant of ComputeFeatureNeighbors is updated to reference the namespaced VoxelNeighbors<Image3D>:: constants while preserving its Z-slice rolling-window bulk-I/O structure.

2. Reapply PR BlueQuartzSoftware#1590's constexpr/const cleanups across the algorithm files where the rebase took --theirs (the OOC commit version) at the 2aa00ee conflict and dropped Nathan's small adjustments:

   SimplnxCore: ComputeBoundaryCellsDirect, ErodeDilateBadData, ErodeDilateCoordinationNumber, ErodeDilateMask, ReplaceElementAttributesWithNeighborValues, RequireMinimumSizeFeatures
   OrientationAnalysis: BadDataNeighborOrientationCheckWorklist, NeighborOrientationCorrelation

   The pattern is uniform: promote the inlined `6` neighbor-array sizes to use VoxelNeighbors<Image3D>::k_FaceNeighborCount via a local k_NumFaceNeighbors alias, make neighborVoxelIndexOffsets const, make faceNeighborInternalIdx constexpr, make isValidFaceNeighbor const where it is not mutated, drop the now-unused DataGroup.hpp include, and const-ify NeighborOrientationCorrelation's orientationOps.

   ComputeFeatureNeighborsFilter.md picks up Nathan's all-dimension note about user-set spacing for shared surface area calculation.

3. Fix a per-element OOC fallback in BadDataNeighborOrientationCheckScanline that was triggered whenever the input mask was a BoolArray rather than a UInt8Array. The previous code routed bool masks through maskCompare->isTrue / maskCompare->setValue per voxel per Z-slice, causing chunk thrashing under chunked OOC storage. The Small_IN100 pipeline test (a 189x201x117 volume with a bool mask produced by MultiThresholdObjects) ran in 4.7 s on simplnx-Rel but 3+ minutes on simplnx-ooc-Rel.

   AbstractDataStore<bool> already exposes copyIntoBuffer/copyFromBuffer just like AbstractDataStore<uint8>; the comment claiming otherwise was stale. Resolve a typed AbstractDataStore<bool>* alongside the existing uint8 store pointer and route both load and write-back through bulk I/O, with a small per-slice std::unique_ptr<bool[]> scratch buffer bridging between the algorithm's uint8 slice buffers and the bool data store's typed bulk API. With this change Small_IN100 OOC drops to 4.6 s (~1.6x in-core, in line with normal OOC overhead).

Tests updated:
- IdentifySampleTest.cpp adopts Nathan's PR BlueQuartzSoftware#1590 hand-built 2D Empty Z/Y/X Non-Square regression tests plus the parameterized identify_sample_v2 exemplar test and the SIMPL Backwards Compatibility test, all wrapped with the OOC dual-path pattern (ForceOocAlgorithmGuard + GENERATE(from_range(k_ForceOocTestValues))). The pre-existing 200x200x200 large-scale OOC validation test is retained.

Verified: simplnx-Rel and simplnx-ooc-Rel preset builds both clean. All 43 affected-filter tests pass on simplnx-Rel; all 86 affected-filter tests pass on simplnx-ooc-Rel (regex covering ComputeFeatureNeighbors, IdentifySample, BadDataNeighborOrientation, ComputeBoundaryCells, ErodeDilate*, NeighborOrientationCorrelation, ReplaceElementAttributesWithNeighborValues, RequireMinimumSizeFeatures).
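The scratch-buffer bridge described in item 3 can be sketched as follows. This is an illustration under assumptions, not the branch's code: BoolStoreSketch is a hypothetical in-memory stand-in for a bool data store, and the copyIntoBuffer / copyFromBuffer signatures shown here (start index, pointer, count) are guesses for illustration, since the real signatures are not given in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a bool data store that offers a bulk API.
class BoolStoreSketch
{
public:
  explicit BoolStoreSketch(std::size_t size)
  : m_Data(std::make_unique<bool[]>(size)) // value-initialized to false
  , m_Size(size)
  {
  }

  void copyIntoBuffer(std::size_t start, bool* dst, std::size_t count) const
  {
    assert(start + count <= m_Size);
    std::copy_n(m_Data.get() + start, count, dst);
  }

  void copyFromBuffer(std::size_t start, const bool* src, std::size_t count)
  {
    assert(start + count <= m_Size);
    std::copy_n(src, count, m_Data.get() + start);
  }

private:
  std::unique_ptr<bool[]> m_Data;
  std::size_t m_Size;
};

// One bulk read per Z-slice, then convert bool -> uint8 in memory.
inline void loadMaskSlice(const BoolStoreSketch& store, std::size_t slice, std::size_t sliceSize, bool* scratch, std::vector<std::uint8_t>& out)
{
  store.copyIntoBuffer(slice * sliceSize, scratch, sliceSize);
  out.resize(sliceSize);
  std::transform(scratch, scratch + sliceSize, out.begin(), [](bool b) { return static_cast<std::uint8_t>(b ? 1 : 0); });
}

// Convert uint8 -> bool in memory, then one bulk write per Z-slice.
inline void storeMaskSlice(BoolStoreSketch& store, std::size_t slice, std::size_t sliceSize, bool* scratch, const std::vector<std::uint8_t>& in)
{
  assert(in.size() == sliceSize);
  std::transform(in.begin(), in.end(), scratch, [](std::uint8_t v) { return v != 0; });
  store.copyFromBuffer(slice * sliceSize, scratch, sliceSize);
}
```

The point of the pattern is that the store is touched once per slice in each direction, so a disk-backed implementation sees a handful of large sequential requests instead of one request per voxel.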

* Replace CreateDataStore + CreateResolvedDataStore with a single resolver-aware CreateDataStore(DataStructure, DataPath, ...) that always consults the registered format resolver. Old explicit-format overload deleted.
* Replace CreateListStore similarly so NeighborList backing storage is OOC-eligible when the OOC plugin is loaded and thresholds permit.
* Inline action-layer caller in ArrayCreationUtilities::CreateArray using GetIOCollection().createDataStoreWithType directly.
* Migrate 23 CreateResolvedDataStore call sites (mechanical rename).
* Migrate 13 cell-level test fixtures that were silently in-memory in OOC builds to the resolver-aware path so OOC builds actually exercise OOC stores.
* Migrate 6 in-memory non-test callers (ComputeFeatureCentroids scratch buffers, HDF5 readers in DataStoreIO and DatasetIO) to direct std::make_shared<DataStore<T>> since they have no DataStructure context.
* Migrate 2 NeighborListIO HDF5 readers to std::make_shared<ListStore<T>> for the same reason (in-core branch of the import pipeline).
* Wire CreateNeighbors action helper through the resolver-aware CreateListStore.
* Rewrite IOFormat.cpp tests to exercise the resolver path.

ImageGeom and RectGridGeom findElementSizes now route through the new CreateDataStore so the voxel-sizes array can go OOC for very large structured grids. RectGridGeom's inner loop also refactored from per-voxel setValue calls to per-axis precompute + Z-slice copyFromBuffer to avoid catastrophic OOC perf when the array is OOC-backed.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

… Format tests

* DataIOCollection's constructor registers the OOC format via SimplnxOoc::registerIOManager under SIMPLNX_USE_OOC, so getManager("HDF5-OOC") resolves in the compile-time-switch OOC build.
* IOFormat: guard the in-core large-data-format preference tests to #ifndef SIMPLNX_USE_OOC (the OOC-build defaults are covered by SimplnxOoc's DataFormatPreferenceTest) and update the "not configured" assertion to the seeded k_InMemoryFormat default.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

The rebase onto upstream/develop brought in the parallelized in-core ComputeFeatureSizes (tbb::combinable thread-local accumulation), while the OOC branch had replaced that loop with serial chunked bulk I/O. Rather than discard either, split the algorithm into the established Direct/Scanline dispatch pattern so each storage backing uses its optimal strategy.

* ComputeFeatureSizesDirect: in-core parallel accumulation (the upstream ParallelDataAlgorithm + tbb::combinable Kahan-summation implementation)
* ComputeFeatureSizesScanline: out-of-core chunked copyIntoBuffer streaming (renamed from the former single ComputeFeatureSizes implementation)
* ComputeFeatureSizes: thin DispatchAlgorithm<Direct, Scanline> dispatcher selecting on whether the FeatureIds array is out-of-core
* Register both new algorithm units in the SimplnxCore CMakeLists
* Exercise both paths in the existing tests via ForceOocAlgorithmGuard + GENERATE(from_range(k_ForceOocTestValues))

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

The in-core build previously forced SIMPLNX_TEST_ALGORITHM_PATH to InCoreOnly whenever SIMPLNX_USE_OOC was OFF, on the assumption that no out-of-core paths exist to test. That is no longer true: the Direct/Scanline dispatch classes are always compiled into the plugins, and forcing the Scanline (OOC) path runs it against in-core data via copyIntoBuffer (a plain std::copy here), staying fast while verifying the OOC algorithm matches the in-core result.

* Only coerce the nonsensical OocOnly (1) to InCoreOnly (2) when OOC is off
* Allow Both (0) so a single in-core build can validate both algorithm paths

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

DatasetIO::createEmptyDataset created the dataset with no creation property list, so out-of-core arrays large enough to take the two-step streaming write (createEmptyDataset + hyperslab writes) were always written contiguous and uncompressed, even when WriteOptions requested compression. The single-shot writeSpan path already applied it.

* Build the dataset creation property list via BuildChunkedDeflateDcpl in createEmptyDataset, matching writeSpan
* Preserves the existing fall-throughs to contiguous storage for compression level 0 and for arrays below the small-array threshold

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

The "Recovery file with all in-core data" test hardcoded StoreType::InMemory and relied on ambient preferences, so it failed whenever forceOocData was set: under forceOoc, the recovery file's inline arrays correctly load as out-of-core stores backed by the recovery file itself.

* Assert the expected store type via RequireExpectedStoreType, which tracks the active large-data preferences (OutOfCore under forceOoc, InMemory otherwise)
* Correct stale comments that claimed OOC was not compiled in

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Add MemoryBudgetManager::maxBudgetBytes() (max(min(total-6GiB, 0.95*total), 1GiB)) and make setBudgetBytes() clamp the upper bound and report whether it clamped. Deduplicate the platform total-RAM ifdef through Memory::GetTotalMemory(). Apply the --memory-budget override to the manager in nxrunner so the cap and the override actually take effect in headless runs (previously the override was written only to Preferences, which the manager never reads in CLI mode). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the SIMPLNX_USE_OOC compile-time switch and the format/preference plumbing that depended on it with runtime interfaces: an injectable store-format resolver, IO-manager lifecycle hooks fanned out by DataIOCollection, and a tri-state DataStorageMode preference. simplnx core no longer references any OOC symbol or on-disk format name; out-of-core capability is supplied entirely by a registered IO manager at runtime.

50 files changed, +1175 / -643 lines.

================================================================================
1. Store-format resolver abstraction
================================================================================

Files: src/simplnx/DataStructure/IO/Generic/IDataStoreFormatResolver.{hpp,cpp}, InMemoryFormatResolver.hpp, src/simplnx/Utilities/ArrayCreationUtilities.{hpp,cpp}, DataStoreUtilities.hpp, DataArrayUtilities.hpp, DataStructure.{hpp,cpp}, Filter/Actions/CreateNeighborListAction.{hpp,cpp}

Introduce IDataStoreFormatResolver, a const, thread-safe policy interface that decides which registered format a soon-to-be-created array uses, returning "" for the in-memory default. InMemoryFormatResolver is the trivial default policy. ArrayCreationUtilities::ResolveStorageFormat becomes the single decision point shared by every creation call site, applying a fixed order: the authoritative unstructured/poly-geometry gate (ParentGeometrySupportsOoc) forces in-core, then an explicit per-filter override wins, then the DataStructure's resolver decides. DataStructure carries a per-instance resolver plus a lazily-seeded process-wide default, neither serialized. CreateArray, CreateListStore, and CreateNeighborListAction route through this helper; CreateNeighborListAction and CreateNeighbors now thread an explicit dataFormat override through to the store.

================================================================================
2. Runtime IO-manager lifecycle hooks and DataIOCollection fan-out
================================================================================

Files: src/simplnx/DataStructure/IO/Generic/IDataIOManager.hpp, DataIOCollection.{hpp,cpp}, IO/HDF5/DataStructureWriter.{hpp,cpp}, DataStructure/StringArray.{hpp,cpp}

Add no-op virtual lifecycle hooks to IDataIOManager (finalizesImport, onImportFinalize, onRecoveryWrite, onFinalizeStores, setBaseDirectory, shutdownManager) so an OOC manager can participate in import finalization, recovery writes, store read-only transition, and shutdown without core knowing the specifics. DataIOCollection aggregates these: finalizeStores now fans out to every manager's hook instead of forwarding to a compiled-in SimplnxOoc call, and new anyManagerFinalizesImport / onImportFinalize / onRecoveryWrite / setBaseDirectory / shutdownManagers dispatch to the registered managers. The HDF5 writer's recovery-write path calls the collection hook rather than a direct SimplnxOoc function. StringArray gains an isPlaceholder() override.

================================================================================
3. DataStorageMode tri-state preference with legacy migration
================================================================================

Files: src/simplnx/Core/Preferences.{hpp,cpp}

Replace the largeDataFormat / forceOocData preference surface with a single canonical DataStorageMode enum (Adaptive, ForceInCore, ForceOutOfCore) persisted as an integer under data_storage_mode. The enum is deliberately OOC-vocabulary-free: core states user intent, the OOC build maps it onto a concrete format. dataStorageMode() is the single source of truth and migrates older preference files from the retained legacy keys; useOocData() becomes a convenience view (true unless ForceInCore). The cached m_UseOoc flag and checkUseOoc() are removed.

================================================================================
4. Remove the SIMPLNX_USE_OOC compile-time switch from core
================================================================================

Files: CMakeLists.txt, cmake/SimplnxConfig.hpp.in, IO/HDF5/DataStoreIO.hpp, Utilities/Parsing/DREAM3D/Dream3dIO.cpp

Drop the SIMPLNX_USE_OOC option, the SIMPLNX_OOC_SOURCE_DIR compile-in of the private SimplnxOoc sources, and the OOC test-suite wiring from CMake. The generated config header no longer defines the macro. All previously #ifdef'd creation, spill-to-disk, and import code paths are now unconditional and route through the runtime interfaces; the import path decides eager-vs-deferred load via anyManagerFinalizesImport() instead of the macro.

================================================================================
5. Resolver-aware load overloads
================================================================================

Files: src/simplnx/Utilities/Parsing/DREAM3D/Dream3dIO.{hpp,cpp}

Add LoadDataStructure and LoadDataStructureArrays overloads that stamp a per-DataStructure resolver before import finalization runs, so a caller (e.g. a read-only visualization load) can direct arrays to disk-backed stores for fast first-show. nullptr matches the existing no-resolver behavior.

================================================================================
6. Test migration and new coverage
================================================================================

Files: test/DataStoreFormatResolverTest.cpp, test/DataStorageModeMigrationTest.cpp, test/CMakeLists.txt, test/IOFormat.cpp, test/Dream3dLoadingApiTest.cpp, test/UnitTestCommon/UnitTestCommon.{hpp,cpp}, and the SimplnxCore / OrientationAnalysis filter tests.

Migrate every test from setForceOocData/setLargeDataFormat to setDataStorageMode / DataStorageMode. PreferencesSentinel takes a DataStorageMode; the UnitTestCommon load helper consults dataStorageMode() and anyManagerFinalizesImport(). New tests cover the resolver (InMemoryFormatResolver, per-instance vs process-default isolation, ParentGeometrySupportsOoc) and the legacy-key migration.

================================================================================
Verification
================================================================================

No build or test run was performed as part of this squash. The squash is verified content-faithful: the squashed commit's tree is confirmed identical to the original range tip (step 8).
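The fixed decision order this commit gives for ArrayCreationUtilities::ResolveStorageFormat (geometry gate, then per-filter override, then resolver) can be sketched as below. Everything here is an assumption for illustration: the class names carry a Sketch suffix, the resolveFormat(byteCount) signature and the size-threshold resolver are hypothetical, and "disk-format" in the usage is a made-up format name. Only the ordering and the "" means in-memory convention come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical policy interface: returns a registered format name,
// or "" for the in-memory default.
class IFormatResolverSketch
{
public:
  virtual ~IFormatResolverSketch() = default;
  virtual std::string resolveFormat(std::size_t byteCount) const = 0;
};

// Trivial default policy: everything stays in memory.
class InMemoryResolverSketch final : public IFormatResolverSketch
{
public:
  std::string resolveFormat(std::size_t) const override
  {
    return {};
  }
};

// Hypothetical policy: arrays at or above a size threshold go to a named format.
class ThresholdResolverSketch final : public IFormatResolverSketch
{
public:
  ThresholdResolverSketch(std::size_t thresholdBytes, std::string format)
  : m_ThresholdBytes(thresholdBytes)
  , m_Format(std::move(format))
  {
    assert(!m_Format.empty());
  }

  std::string resolveFormat(std::size_t byteCount) const override
  {
    return byteCount >= m_ThresholdBytes ? m_Format : std::string{};
  }

private:
  std::size_t m_ThresholdBytes;
  std::string m_Format;
};

// Single decision point, applying the order from the commit message.
inline std::string resolveStorageFormat(bool parentGeometrySupportsOoc, const std::optional<std::string>& filterOverride, const IFormatResolverSketch& resolver, std::size_t byteCount)
{
  if(!parentGeometrySupportsOoc)
  {
    return {}; // 1. geometry gate is authoritative: force in-core
  }
  if(filterOverride.has_value())
  {
    return *filterOverride; // 2. explicit per-filter override wins
  }
  return resolver.resolveFormat(byteCount); // 3. otherwise the resolver decides
}
```

Because the gate is checked first, an override cannot push an array with an unsupported parent geometry out of core, which matches the "authoritative" wording above.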

…d_alloc -272 net) Add a non-blocking preflight warning (-271) when an in-core array would exceed currently-available RAM (OOC arrays excluded; EmptyDataStore::memoryUsage() now reports 0 for out-of-core placeholders, format resolved in preflight via a shared ResolveStorageFormat helper), and a std::bad_alloc safety net (-272) at the single IFilter::execute -> executeImpl boundary so an out-of-memory condition is a clean pipeline error instead of a crash. Additive to the existing -264 total-RAM hard block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
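A minimal sketch of the std::bad_alloc safety net at a single execute boundary might look like this. The result type, function name, and message text are hypothetical; only the -272 code and the idea of catching std::bad_alloc at one execute -> executeImpl call site come from the commit message.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <string>

// Hypothetical result type: a negative code marks an error.
struct ExecuteResultSketch
{
  int code = 0;
  std::string message;
};

constexpr int k_BadAllocErrorCode = -272;

// Wrap the implementation call once, so every filter gets the same guard.
inline ExecuteResultSketch executeWithSafetyNet(const std::function<ExecuteResultSketch()>& executeImpl)
{
  assert(static_cast<bool>(executeImpl));
  try
  {
    return executeImpl();
  } catch(const std::bad_alloc&)
  {
    // Turn an out-of-memory condition into a reportable pipeline error.
    return {k_BadAllocErrorCode, "Out of memory while executing the filter."};
  }
}
```

Catching at one boundary keeps individual filters free of allocation-failure handling, at the cost of losing detail about which allocation failed.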

After the out-of-core resolver refactor (f68689d), Dream3dIO.cpp exceeds MSVC's default COMDAT section limit and fails to compile in Debug with C1128. /bigobj raises the limit; Release was unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the inline "Automatic" / "In Memory" strings in DataIOCollection with k_AutomaticDisplayName / k_InMemoryDisplayName constants so the labels have a single definition. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SliceBufferedTransfer.hpp and SimplnxCore's IdentifySampleCommon.hpp were added on this branch but never listed in any CMake, so they were invisible in the IDE. Add them to SIMPLNX_HDRS and PLUGIN_EXTRA_SOURCES respectively, where the existing source_group(TREE ...) auto-groups them (simplnx/Utilities, Filters/Algorithms). Also lift simplnx_test's inline source list into a variable and source_group it under "test" / "Generated", mirroring the plugin-test convention in cmake/Plugin.cmake. No build/behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zlib was previously present only transitively, through the hdf5 dependency's "zlib" feature. The out-of-core layer that consumers compile into libsimplnx (via SimplnxOoc's OOC.cmake) now uses zlib directly: DeflateChunkLoader inflates raw HDF5 deflate chunks with uncompress() so chunk decompression can run in parallel, off the global HDF5 mutex, and OOC.cmake links ZLIB::ZLIB via find_package(ZLIB). Declaring zlib as a direct dependency records that direct use in the manifest and keeps the build from silently breaking if hdf5's feature set ever drops the transitive pull-in.

* Append each plugin's unit-test target to the SIMPLNX_UNIT_TEST_TARGETS global property in create_simplnx_plugin_unit_test()
* Lets consumer builds that include simplnx via add_subdirectory attach additional sources or settings to the test executables without simplnx knowing about the consumer

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

* Replace the per-cell loop over FeatureIds and CellPhases with chunk-sequential copyIntoBuffer reads (bounded 64K-tuple buffers) and a single bulk copyFromBuffer write of the feature-level result
* Replace the per-cell std::map lookup with feature-level vectors; warning semantics and output are unchanged (warning set membership is identical under previous-value comparison, and the last phase seen still wins)
* Move the cancel check to the per-chunk loop and add throttled progress messaging
* Request disk-backed stores in the filter test via PreferencesSentinel so the OOC build exercises the filter against HDF5-backed data
* Benchmark (748,800 cells): disk-backed stores 11.7 s -> 35 ms, in-memory stores 8 ms -> 1 ms

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
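The chunk-sequential pattern in this commit can be illustrated with a short sketch. It uses std::vector inputs and std::copy_n as stand-ins for the disk-backed arrays and their copyIntoBuffer calls, and the function and struct names are hypothetical. The warning rule (flag a feature when a cell's phase differs from the previously recorded value, last phase wins) follows the wording above but is an interpretation, not the filter's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

struct FeaturePhasesResultSketch
{
  std::vector<std::int32_t> featurePhases;      // one entry per feature
  std::set<std::int32_t> conflictingFeatures;   // features seen with more than one phase
};

inline FeaturePhasesResultSketch computeFeaturePhases(const std::vector<std::int32_t>& featureIds, const std::vector<std::int32_t>& cellPhases, std::size_t numFeatures,
                                                      std::size_t chunkTuples = 65536)
{
  assert(featureIds.size() == cellPhases.size());
  assert(chunkTuples > 0);

  FeaturePhasesResultSketch result;
  result.featurePhases.assign(numFeatures, 0);
  std::vector<std::uint8_t> seen(numFeatures, 0);

  // Bounded buffers: memory use does not grow with the number of cells.
  std::vector<std::int32_t> idBuffer(chunkTuples);
  std::vector<std::int32_t> phaseBuffer(chunkTuples);

  for(std::size_t offset = 0; offset < featureIds.size(); offset += chunkTuples)
  {
    const std::size_t count = std::min(chunkTuples, featureIds.size() - offset);
    // Stand-ins for two bulk copyIntoBuffer reads of the same cell range.
    std::copy_n(featureIds.data() + offset, count, idBuffer.begin());
    std::copy_n(cellPhases.data() + offset, count, phaseBuffer.begin());

    for(std::size_t i = 0; i < count; ++i)
    {
      assert(idBuffer[i] >= 0 && static_cast<std::size_t>(idBuffer[i]) < numFeatures);
      const auto featureId = static_cast<std::size_t>(idBuffer[i]);
      if(seen[featureId] != 0 && result.featurePhases[featureId] != phaseBuffer[i])
      {
        result.conflictingFeatures.insert(idBuffer[i]);
      }
      result.featurePhases[featureId] = phaseBuffer[i]; // last phase seen wins
      seen[featureId] = 1;
    }
  }
  return result;
}
```

In a real filter the per-chunk loop is also the natural place for the cancel check and throttled progress message mentioned above, and the feature-level vector would be written back with one bulk call.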

This reverts commit 1ea72d6.

…base

* Pass the ColorKey choice through the ComputeIPFColorsScanline generateIPFColor call so non-TSL color keys reach the OOC dispatch path (the in-core Direct path already forwarded it)
* Port the V&V ColorKey plumbing test to the LoadDataStructure API that replaced ImportDataStructureFromFile on this branch
* Drop a stale comment reference to the removed ImportDataStructureFromFile in Dream3dIO.cpp

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

* defaultBudgetBytes() returned 50% of RAM without checking maxBudgetBytes(), so on machines under 12 GiB the 6 GiB reserve made the cap smaller than the default (e.g. 7 GiB CI runners: 3.5 GiB default vs 1 GiB cap), failing the cap-and-clamping unit test
* The constructor seeds the budget directly from the default, so such machines also ran with an over-cap budget at startup
* Clamp the default to the cap; machines with 12 GiB or more are unaffected

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
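The arithmetic behind this fix can be checked with a small sketch. It uses the cap formula stated for maxBudgetBytes(), max(min(total - 6 GiB, 0.95 * total), 1 GiB), and the 50% default from this commit. The free functions taking total RAM as a parameter are hypothetical, and treating the subtraction as saturating at zero is an assumption, since the commit messages do not say how totals under 6 GiB are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t k_GiB = 1ULL << 30;

// Cap: max(min(total - 6 GiB, 0.95 * total), 1 GiB).
// Assumption: the subtraction saturates at zero for machines under 6 GiB.
inline std::uint64_t maxBudgetBytes(std::uint64_t totalRam)
{
  assert(totalRam > 0);
  const std::uint64_t reserveBased = totalRam > 6 * k_GiB ? totalRam - 6 * k_GiB : 0;
  const std::uint64_t fractionBased = totalRam - totalRam / 20; // 95% of total
  return std::max(std::min(reserveBased, fractionBased), k_GiB);
}

// Default: 50% of RAM, clamped to the cap (the fix described above).
inline std::uint64_t defaultBudgetBytes(std::uint64_t totalRam)
{
  return std::min(totalRam / 2, maxBudgetBytes(totalRam));
}
```

For a 7 GiB machine the cap works out to 1 GiB, so the unclamped 3.5 GiB default would exceed it. At 12 GiB the cap and the 50% default are both 6 GiB, which is why machines at or above that size are unaffected.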

* Bulk-read FeatureEulerAngles, FeaturePhases, and CrystalStructures into local vectors once via copyIntoBuffer, mirroring ComputeGBCDMetricBased
* The parallel triangle selector and the distinct-boundary loop index these arrays randomly by feature id; when they are out-of-core, per-element access turned every triangle into a disk/cache lookup
* Eliminates a ~3.3x out-of-core slowdown (20.5s -> 6.3s on the GBPD metric test) to match in-core speed, with no change to computed results

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

MemoryBudgetManager, IParallelAlgorithm, and EmptyDataStore comments named concrete out-of-core implementation types (ChunkCache, stride/partition caches, AbstractOocStore). simplnx core is OOC-agnostic, so describe the behavior in generic terms (cache subsystems, disk-backed stores/arrays) instead. The opaque "HDF5-OOC" registered-format-name references in the DataIOCollection format registry are intentionally left — that is the documented decoupling seam where core handles the format as an opaque string. Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

getBoundingBox() seeded the upper-right corner with std::numeric_limits<float>::min() -- the smallest POSITIVE normal float (~1.18e-38), not the most-negative value. For any node geometry whose maximum coordinate on an axis is <= ~0 (e.g. centered at or below the origin), the max corner never updated during the vertex walk, producing a wrong/oversized bounding box. Geometries lying entirely in positive space happened to work, which masked the bug. Seed the upper corner with std::numeric_limits<float>::lowest() so the max is computed correctly regardless of where the geometry sits. Signed-off-by: Jessica Marquis <jessica.marquis@bluequartz.net>
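The seeding issue is easy to reproduce in isolation. The sketch below is not the geometry class's getBoundingBox(); Point3f, BoundingBoxSketch, and computeBoundingBox are hypothetical names, and the function simply shows the corrected seeding with std::numeric_limits<float>::lowest() for the max corner.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point3f = std::array<float, 3>;

struct BoundingBoxSketch
{
  Point3f minCorner;
  Point3f maxCorner;
};

inline BoundingBoxSketch computeBoundingBox(const std::vector<Point3f>& vertices)
{
  assert(!vertices.empty());
  BoundingBoxSketch box;
  // max() is the largest finite float, so any coordinate lowers the min corner.
  box.minCorner.fill(std::numeric_limits<float>::max());
  // lowest() is the most negative finite float. min() would be the smallest
  // positive normal value and would never be replaced by a coordinate <= 0.
  box.maxCorner.fill(std::numeric_limits<float>::lowest());

  for(const Point3f& vertex : vertices)
  {
    for(std::size_t axis = 0; axis < 3; ++axis)
    {
      box.minCorner[axis] = std::min(box.minCorner[axis], vertex[axis]);
      box.maxCorner[axis] = std::max(box.maxCorner[axis], vertex[axis]);
    }
  }
  return box;
}
```

A geometry lying entirely in negative space is the simplest regression case: with the old seed its max corner would stay at a tiny positive value on every axis.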

joeykleingers added the Out-of-Core label Mar 24, 2026

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from b4ef97f to 99b49ed Compare March 24, 2026 18:13

joeykleingers requested review from JDuffeyBQ, imikejackson, jmarquisbq, mmarineBlueQuartz and nyoungbq March 24, 2026 18:17

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 2 times, most recently from b4ef97f to bb09048 Compare March 24, 2026 18:51

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 102c436 to b4c1358 Compare April 2, 2026 00:55

joeykleingers changed the title ~~WIP: OOC architecture rewrite — new bulk I/O API, SimplnxOoc plugin, and filter optimizations~~ ENH: OOC architecture rewrite — new bulk I/O API and infrastructure Apr 2, 2026

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 6 times, most recently from 2bd614a to 110c054 Compare April 8, 2026 17:41

joeykleingers changed the title ~~ENH: OOC architecture rewrite — new bulk I/O API and infrastructure~~ WIP: ENH: OOC architecture rewrite — new bulk I/O API and infrastructure Apr 8, 2026

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch 4 times, most recently from 35aecd0 to 3a88bbf Compare April 16, 2026 13:03

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from bdfed87 to 6fbfc8d Compare April 27, 2026 18:18

joeykleingers and others added 27 commits June 23, 2026 16:09

Fix data archive hashes

9df42ee

Revert "Fix data archive hashes"

6084606

This reverts commit 1ea72d6.

joeykleingers force-pushed the worktree-ooc-architecture-rewrite branch from ed64934 to a56f168 Compare June 25, 2026 17:25

joeykleingers and others added 2 commits June 26, 2026 11:34

joeykleingers wants to merge 43 commits into BlueQuartzSoftware:develop from joeykleingers:worktree-ooc-architecture-rewrite

joeykleingers commented Mar 24, 2026 • edited


3 participants


Summary

Scope of this diff

How to review

Part 1 — Core out-of-core architecture (src/simplnx)

1.1 Bulk-I/O data-store API

1.2 Runtime store-format resolution and IO-manager hooks

1.3 Storage-mode preference (+ legacy migration)

1.4 Memory budget + memory safety

1.5 HDF5 streaming write + compression

1.6 .dream3d loader API

1.7 Core algorithm infrastructure

1.8 Core utility bulk-I/O conversions

Part 2 — Filter optimizations

Optimization patterns

Algorithm structure: in-core + out-of-core variants

SimplnxCore inventory

OrientationAnalysis inventory

Cancel + progress

Part 3 — Performance results

Mesh Generation Filters (full ctest wall-clock, OOC build)

Groups B–E (200³ dataset, filter.execute() only)

Pipeline-Critical Filters (filter.execute() only, OOC build)

OrientationAnalysis Filters (full ctest wall-clock, OOC build)

GBCD Filter Group (full ctest wall-clock)

HDF5 Import + Pole Figure Filters (full ctest wall-clock)

Additional Optimizations (full ctest wall-clock)

Geometry / Mesh / Phase Filters

Part 4 — Test infrastructure

Part 5 — Build system

Part 6 — Documentation

Part 7 — Test data archives

Related PR

Test Plan


Part 1 — Core out-of-core architecture (`src/simplnx`)

1.6 `.dream3d` loader API